Conversation

@raphaelgavache
Member

@raphaelgavache raphaelgavache commented Oct 8, 2025

Add support for service discovery using JNA. A second version using JNI will replace this approach when available.

Enabled by default. Can be disabled with
env var: DD_TRACE_SERVICE_DISCOVERY_ENABLED=false
system property: dd.trace.service.discovery.enabled=false
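
As a rough illustration of how such a flag is typically resolved, the sketch below checks the system property first, then the environment variable, and falls back to the default of true. `ServiceDiscoveryFlag` and `isEnabled` are hypothetical names for this example; the tracer resolves the setting through its own `Config` class, and its precedence rules may differ.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of dd-trace-java.
public class ServiceDiscoveryFlag {
  static final String SYS_PROP = "dd.trace.service.discovery.enabled";
  static final String ENV_VAR = "DD_TRACE_SERVICE_DISCOVERY_ENABLED";

  // Assumed precedence for this sketch: system property, then env var, then default (true).
  public static boolean isEnabled(Properties props, Map<String, String> env) {
    String value = props.getProperty(SYS_PROP);
    if (value == null) {
      value = env.get(ENV_VAR);
    }
    // Unset means enabled; any value other than "true" (case-insensitive) disables.
    return value == null || Boolean.parseBoolean(value.trim());
  }

  public static void main(String[] args) {
    System.out.println(isEnabled(System.getProperties(), System.getenv()));
  }
}
```

Running it with `-Ddd.trace.service.discovery.enabled=false` on the command line is a quick way to confirm the property is picked up.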

Test instructions

On system-tests

To test on system-tests

  1. go to the matching system-test branch https://github.com/DataDog/system-tests/pull/5502/files
  2. download from gitlab dd-trace-api-1.55.0-SNAPSHOT.jar and dd-java-agent-1.55.0-SNAPSHOT.jar
  3. add both jars in system-tests-binaries
./build.sh -i runner
source venv/bin/activate
./build.sh java
./run.sh PARAMETRIC -L java tests/parametric/test_process_discovery.py::Test_ProcessDiscovery

On a linux VM

# install injector apm
DD_SITE="datadoghq.com" DD_APM_INSTRUMENTATION_ENABLED=host DD_API_KEY=x DD_APM_INSTRUMENTATION_LIBRARIES=java:1 bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"

# install java tracer commit
sudo datadog-installer install "oci://installtesting.datad0g.com/apm-library-java-package:34aa766f77fb84bd5c43ad4daaea3a149d339683"

Check file descriptors

$ java Sleep &
[1] 1346653
$ cat /proc/1346653/fd/11
schema_versiontracer_languagejavatracer_version1.55.0-SNAPSHOT~34aa766f7hostnameraphael-debian12
runtime_id$b27e9ae7-f5eb-4f84-b298-3d2c6c6cb414
                                               service_nameSleep
                                                                process_tags\entrypoint.name:sleep,entrypoint.type:class,entrypoint.workdir:raphael_gavach

Sleep is a Java class that sleeps for 50 s, used for testing purposes.
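
The descriptor number (11 above) varies between runs. Here is a minimal sketch for finding it without guessing, assuming the usual Linux `/proc/<pid>/fd` layout in which a memfd appears as a symlink whose target starts with `/memfd:`. `MemfdFinder` and `findMemfds` are names made up for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for manual checks; not part of the tracer.
public class MemfdFinder {
  // Returns "fd -> target" for every entry of fdDir whose symlink target starts with "/memfd:".
  public static List<String> findMemfds(Path fdDir) {
    List<String> found = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(fdDir)) {
      for (Path entry : entries) {
        try {
          String target = Files.readSymbolicLink(entry).toString();
          if (target.startsWith("/memfd:")) {
            found.add(entry.getFileName() + " -> " + target);
          }
        } catch (IOException | UnsupportedOperationException e) {
          // Not a symlink, or the descriptor was closed while scanning: skip it.
        }
      }
    } catch (IOException e) {
      // Directory missing or unreadable (non-Linux system, no permission): report nothing.
    }
    return found;
  }

  public static void main(String[] args) {
    String pid = args.length > 0 ? args[0] : "self";
    findMemfds(Paths.get("/proc", pid, "fd")).forEach(System.out::println);
  }
}
```

Passing the PID of the `java Sleep` process as the first argument lists the candidate descriptors to `cat`.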

Motivation

Additional Notes

Contributor Checklist

Jira ticket: [PROJ-IDENT]

@raphaelgavache raphaelgavache changed the title from "try to plug memfd to core-tracer" to "support service discovery with JNA" Oct 8, 2025
@datadog-datadog-prod-us1
Contributor

datadog-datadog-prod-us1 bot commented Oct 8, 2025

🎯 Code Coverage
Patch Coverage: 37.96%
Total Coverage: 59.59% (-0.22%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: fa05b36

@pr-commenter

pr-commenter bot commented Oct 8, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master raphael/memfd
git_commit_date 1760635008 1760636118
git_commit_sha c85d09f fa05b36
release_version 1.55.0-SNAPSHOT~c85d09f004 1.55.0-SNAPSHOT~fa05b36659
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1760637950 1760637950
ci_job_id 1183514612 1183514612
ci_pipeline_id 79539409 79539409
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-j2ozgmme 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-j2ozgmme 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 56 metrics, 9 unstable metrics.

Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.019 s) : 0, 1018865
Total [baseline] (10.685 s) : 0, 10685259
Agent [candidate] (1.025 s) : 0, 1024999
Total [candidate] (10.831 s) : 0, 10831201
section appsec
Agent [baseline] (1.195 s) : 0, 1194750
Total [baseline] (11.097 s) : 0, 11097424
Agent [candidate] (1.202 s) : 0, 1201617
Total [candidate] (10.839 s) : 0, 10839238
section iast
Agent [baseline] (1.158 s) : 0, 1157750
Total [baseline] (11.082 s) : 0, 11081535
Agent [candidate] (1.15 s) : 0, 1150424
Total [candidate] (11.125 s) : 0, 11124772
section profiling
Agent [baseline] (1.163 s) : 0, 1163190
Total [baseline] (11.023 s) : 0, 11023431
Agent [candidate] (1.172 s) : 0, 1171839
Total [candidate] (10.941 s) : 0, 10940510
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.019 s -
Agent appsec 1.195 s 175.885 ms (17.3%)
Agent iast 1.158 s 138.885 ms (13.6%)
Agent profiling 1.163 s 144.325 ms (14.2%)
Total tracing 10.685 s -
Total appsec 11.097 s 412.165 ms (3.9%)
Total iast 11.082 s 396.276 ms (3.7%)
Total profiling 11.023 s 338.172 ms (3.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.025 s -
Agent appsec 1.202 s 176.617 ms (17.2%)
Agent iast 1.15 s 125.425 ms (12.2%)
Agent profiling 1.172 s 146.84 ms (14.3%)
Total tracing 10.831 s -
Total appsec 10.839 s 8.037 ms (0.1%)
Total iast 11.125 s 293.57 ms (2.7%)
Total profiling 10.941 s 109.309 ms (1.0%)
gantt
    title petclinic - break down per module: candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.471 ms) : 0, 1471
crashtracking [candidate] (1.483 ms) : 0, 1483
BytebuddyAgent [baseline] (694.199 ms) : 0, 694199
BytebuddyAgent [candidate] (697.693 ms) : 0, 697693
GlobalTracer [baseline] (242.065 ms) : 0, 242065
GlobalTracer [candidate] (243.979 ms) : 0, 243979
AppSec [baseline] (32.424 ms) : 0, 32424
AppSec [candidate] (32.831 ms) : 0, 32831
Debugger [baseline] (6.45 ms) : 0, 6450
Debugger [candidate] (6.442 ms) : 0, 6442
Remote Config [baseline] (686.49 µs) : 0, 686
Remote Config [candidate] (679.145 µs) : 0, 679
Telemetry [baseline] (9.295 ms) : 0, 9295
Telemetry [candidate] (9.546 ms) : 0, 9546
Flare Poller [baseline] (11.188 ms) : 0, 11188
Flare Poller [candidate] (11.103 ms) : 0, 11103
section appsec
crashtracking [baseline] (1.463 ms) : 0, 1463
crashtracking [candidate] (1.458 ms) : 0, 1458
BytebuddyAgent [baseline] (718.095 ms) : 0, 718095
BytebuddyAgent [candidate] (722.883 ms) : 0, 722883
GlobalTracer [baseline] (234.267 ms) : 0, 234267
GlobalTracer [candidate] (236.465 ms) : 0, 236465
IAST [baseline] (24.798 ms) : 0, 24798
IAST [candidate] (24.999 ms) : 0, 24999
AppSec [baseline] (174.891 ms) : 0, 174891
AppSec [candidate] (175.459 ms) : 0, 175459
Debugger [baseline] (6.192 ms) : 0, 6192
Debugger [candidate] (6.053 ms) : 0, 6053
Remote Config [baseline] (636.521 µs) : 0, 637
Remote Config [candidate] (620.477 µs) : 0, 620
Telemetry [baseline] (8.554 ms) : 0, 8554
Telemetry [candidate] (8.527 ms) : 0, 8527
Flare Poller [baseline] (4.805 ms) : 0, 4805
Flare Poller [candidate] (3.935 ms) : 0, 3935
section iast
crashtracking [baseline] (1.461 ms) : 0, 1461
crashtracking [candidate] (1.458 ms) : 0, 1458
BytebuddyAgent [baseline] (817.928 ms) : 0, 817928
BytebuddyAgent [candidate] (814.404 ms) : 0, 814404
GlobalTracer [baseline] (234.146 ms) : 0, 234146
GlobalTracer [candidate] (232.077 ms) : 0, 232077
IAST [baseline] (27.088 ms) : 0, 27088
IAST [candidate] (26.498 ms) : 0, 26498
AppSec [baseline] (35.59 ms) : 0, 35590
AppSec [candidate] (34.966 ms) : 0, 34966
Debugger [baseline] (6.24 ms) : 0, 6240
Debugger [candidate] (6.167 ms) : 0, 6167
Remote Config [baseline] (627.199 µs) : 0, 627
Remote Config [candidate] (595.415 µs) : 0, 595
Telemetry [baseline] (8.839 ms) : 0, 8839
Telemetry [candidate] (8.574 ms) : 0, 8574
Flare Poller [baseline] (4.339 ms) : 0, 4339
Flare Poller [candidate] (4.246 ms) : 0, 4246
section profiling
crashtracking [baseline] (1.429 ms) : 0, 1429
crashtracking [candidate] (1.446 ms) : 0, 1446
BytebuddyAgent [baseline] (722.465 ms) : 0, 722465
BytebuddyAgent [candidate] (727.042 ms) : 0, 727042
GlobalTracer [baseline] (217.674 ms) : 0, 217674
GlobalTracer [candidate] (220.078 ms) : 0, 220078
AppSec [baseline] (32.385 ms) : 0, 32385
AppSec [candidate] (32.711 ms) : 0, 32711
Debugger [baseline] (6.551 ms) : 0, 6551
Debugger [candidate] (8.414 ms) : 0, 8414
Remote Config [baseline] (767.579 µs) : 0, 768
Remote Config [candidate] (720.467 µs) : 0, 720
Telemetry [baseline] (15.198 ms) : 0, 15198
Telemetry [candidate] (14.617 ms) : 0, 14617
Flare Poller [baseline] (4.866 ms) : 0, 4866
Flare Poller [candidate] (4.194 ms) : 0, 4194
ProfilingAgent [baseline] (108.991 ms) : 0, 108991
ProfilingAgent [candidate] (109.681 ms) : 0, 109681
Profiling [baseline] (109.743 ms) : 0, 109743
Profiling [candidate] (110.294 ms) : 0, 110294
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.024 s) : 0, 1024233
Total [baseline] (8.667 s) : 0, 8666831
Agent [candidate] (1.027 s) : 0, 1026688
Total [candidate] (8.705 s) : 0, 8704840
section iast
Agent [baseline] (1.152 s) : 0, 1152209
Total [baseline] (9.296 s) : 0, 9296380
Agent [candidate] (1.154 s) : 0, 1154356
Total [candidate] (9.304 s) : 0, 9304436
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.024 s -
Agent iast 1.152 s 127.976 ms (12.5%)
Total tracing 8.667 s -
Total iast 9.296 s 629.549 ms (7.3%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.027 s -
Agent iast 1.154 s 127.668 ms (12.4%)
Total tracing 8.705 s -
Total iast 9.304 s 599.596 ms (6.9%)
gantt
    title insecure-bank - break down per module: candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.479 ms) : 0, 1479
crashtracking [candidate] (1.478 ms) : 0, 1478
BytebuddyAgent [baseline] (697.618 ms) : 0, 697618
BytebuddyAgent [candidate] (698.59 ms) : 0, 698590
GlobalTracer [baseline] (242.764 ms) : 0, 242764
GlobalTracer [candidate] (244.791 ms) : 0, 244791
AppSec [baseline] (32.832 ms) : 0, 32832
AppSec [candidate] (32.915 ms) : 0, 32915
Debugger [baseline] (6.555 ms) : 0, 6555
Debugger [candidate] (6.548 ms) : 0, 6548
Remote Config [baseline] (700.726 µs) : 0, 701
Remote Config [candidate] (695.095 µs) : 0, 695
Telemetry [baseline] (9.398 ms) : 0, 9398
Telemetry [candidate] (9.542 ms) : 0, 9542
Flare Poller [baseline] (11.702 ms) : 0, 11702
Flare Poller [candidate] (10.947 ms) : 0, 10947
section iast
crashtracking [baseline] (1.489 ms) : 0, 1489
crashtracking [candidate] (1.486 ms) : 0, 1486
BytebuddyAgent [baseline] (816.557 ms) : 0, 816557
BytebuddyAgent [candidate] (816.567 ms) : 0, 816567
GlobalTracer [baseline] (231.763 ms) : 0, 231763
GlobalTracer [candidate] (232.836 ms) : 0, 232836
IAST [baseline] (26.323 ms) : 0, 26323
IAST [candidate] (26.713 ms) : 0, 26713
AppSec [baseline] (34.999 ms) : 0, 34999
AppSec [candidate] (35.38 ms) : 0, 35380
Debugger [baseline] (6.094 ms) : 0, 6094
Debugger [candidate] (6.207 ms) : 0, 6207
Remote Config [baseline] (614.057 µs) : 0, 614
Remote Config [candidate] (597.224 µs) : 0, 597
Telemetry [baseline] (8.678 ms) : 0, 8678
Telemetry [candidate] (8.787 ms) : 0, 8787
Flare Poller [baseline] (4.219 ms) : 0, 4219
Flare Poller [candidate] (4.282 ms) : 0, 4282

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master raphael/memfd
git_commit_date 1760635008 1760636118
git_commit_sha c85d09f fa05b36
release_version 1.55.0-SNAPSHOT~c85d09f004 1.55.0-SNAPSHOT~fa05b36659
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1760637527 1760637527
ci_job_id 1183514614 1183514614
ci_pipeline_id 79539409 79539409
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-d7em4yj2 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-d7em4yj2 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 2 performance improvements and 1 performance regressions! Performance is the same for 9 metrics, 12 unstable metrics.

scenario:load:insecure-bank:iast:high_load
  Δ mean http_req_duration: better, [-1154.395µs; -782.723µs] or [-10.957%; -7.429%]
  Δ mean throughput: unstable, [-8.687op/s; +97.375op/s] or [-1.969%; +22.076%]
  candidate mean http_req_duration: 9.567ms
  candidate mean throughput: 485.438op/s
  baseline mean http_req_duration: 10.536ms
  baseline mean throughput: 441.094op/s
scenario:load:insecure-bank:profiling:high_load
  Δ mean http_req_duration: better, [-623.306µs; -308.212µs] or [-6.766%; -3.346%]
  Δ mean throughput: unstable, [-41.402op/s; +94.839op/s] or [-8.219%; +18.828%]
  candidate mean http_req_duration: 8.746ms
  candidate mean throughput: 530.438op/s
  baseline mean http_req_duration: 9.212ms
  baseline mean throughput: 503.719op/s
scenario:load:petclinic:profiling:high_load
  Δ mean http_req_duration: worse, [+1.183ms; +2.232ms] or [+2.485%; +4.690%]
  Δ mean throughput: unstable, [-10.249op/s; +3.424op/s] or [-10.421%; +3.481%]
  candidate mean http_req_duration: 49.301ms
  candidate mean throughput: 94.938op/s
  baseline mean http_req_duration: 47.593ms
  baseline mean throughput: 98.350op/s
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.177 ms) : 4127, 4227
.   : milestone, 4177,
iast (10.536 ms) : 10351, 10720
.   : milestone, 10536,
iast_FULL (14.526 ms) : 14234, 14818
.   : milestone, 14526,
iast_GLOBAL (10.528 ms) : 10338, 10717
.   : milestone, 10528,
profiling (9.212 ms) : 9056, 9368
.   : milestone, 9212,
tracing (7.73 ms) : 7614, 7845
.   : milestone, 7730,
section candidate
no_agent (4.158 ms) : 4108, 4207
.   : milestone, 4158,
iast (9.567 ms) : 9407, 9727
.   : milestone, 9567,
iast_FULL (14.303 ms) : 14022, 14585
.   : milestone, 14303,
iast_GLOBAL (10.822 ms) : 10628, 11015
.   : milestone, 10822,
profiling (8.746 ms) : 8610, 8882
.   : milestone, 8746,
tracing (7.557 ms) : 7442, 7672
.   : milestone, 7557,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.177 ms [4.127 ms, 4.227 ms] -
iast 10.536 ms [10.351 ms, 10.72 ms] 6.359 ms (152.2%)
iast_FULL 14.526 ms [14.234 ms, 14.818 ms] 10.349 ms (247.8%)
iast_GLOBAL 10.528 ms [10.338 ms, 10.717 ms] 6.351 ms (152.0%)
profiling 9.212 ms [9.056 ms, 9.368 ms] 5.035 ms (120.5%)
tracing 7.73 ms [7.614 ms, 7.845 ms] 3.553 ms (85.1%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.158 ms [4.108 ms, 4.207 ms] -
iast 9.567 ms [9.407 ms, 9.727 ms] 5.409 ms (130.1%)
iast_FULL 14.303 ms [14.022 ms, 14.585 ms] 10.146 ms (244.0%)
iast_GLOBAL 10.822 ms [10.628 ms, 11.015 ms] 6.664 ms (160.3%)
profiling 8.746 ms [8.61 ms, 8.882 ms] 4.588 ms (110.4%)
tracing 7.557 ms [7.442 ms, 7.672 ms] 3.399 ms (81.8%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004
    dateFormat X
    axisFormat %s
section baseline
no_agent (37.264 ms) : 36962, 37566
.   : milestone, 37264,
appsec (48.457 ms) : 48010, 48904
.   : milestone, 48457,
code_origins (45.474 ms) : 45069, 45878
.   : milestone, 45474,
iast (46.059 ms) : 45666, 46452
.   : milestone, 46059,
profiling (47.593 ms) : 47128, 48059
.   : milestone, 47593,
tracing (45.169 ms) : 44781, 45557
.   : milestone, 45169,
section candidate
no_agent (36.541 ms) : 36242, 36840
.   : milestone, 36541,
appsec (48.846 ms) : 48409, 49282
.   : milestone, 48846,
code_origins (44.303 ms) : 43925, 44681
.   : milestone, 44303,
iast (44.937 ms) : 44556, 45318
.   : milestone, 44937,
profiling (49.301 ms) : 48792, 49810
.   : milestone, 49301,
tracing (44.695 ms) : 44316, 45075
.   : milestone, 44695,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 37.264 ms [36.962 ms, 37.566 ms] -
appsec 48.457 ms [48.01 ms, 48.904 ms] 11.193 ms (30.0%)
code_origins 45.474 ms [45.069 ms, 45.878 ms] 8.209 ms (22.0%)
iast 46.059 ms [45.666 ms, 46.452 ms] 8.795 ms (23.6%)
profiling 47.593 ms [47.128 ms, 48.059 ms] 10.329 ms (27.7%)
tracing 45.169 ms [44.781 ms, 45.557 ms] 7.905 ms (21.2%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 36.541 ms [36.242 ms, 36.84 ms] -
appsec 48.846 ms [48.409 ms, 49.282 ms] 12.304 ms (33.7%)
code_origins 44.303 ms [43.925 ms, 44.681 ms] 7.762 ms (21.2%)
iast 44.937 ms [44.556 ms, 45.318 ms] 8.396 ms (23.0%)
profiling 49.301 ms [48.792 ms, 49.81 ms] 12.76 ms (34.9%)
tracing 44.695 ms [44.316 ms, 45.075 ms] 8.154 ms (22.3%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master raphael/memfd
git_commit_date 1760635008 1760636118
git_commit_sha c85d09f fa05b36
release_version 1.55.0-SNAPSHOT~c85d09f004 1.55.0-SNAPSHOT~fa05b36659
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1760638067 1760638067
ci_job_id 1183514615 1183514615
ci_pipeline_id 79539409 79539409
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-jfw5wngz 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-jfw5wngz 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.476 ms) : 1465, 1488
.   : milestone, 1476,
appsec (3.685 ms) : 3468, 3902
.   : milestone, 3685,
iast (2.21 ms) : 2147, 2274
.   : milestone, 2210,
iast_GLOBAL (2.254 ms) : 2190, 2318
.   : milestone, 2254,
profiling (2.052 ms) : 2000, 2103
.   : milestone, 2052,
tracing (2.032 ms) : 1982, 2082
.   : milestone, 2032,
section candidate
no_agent (1.473 ms) : 1462, 1485
.   : milestone, 1473,
appsec (3.693 ms) : 3475, 3911
.   : milestone, 3693,
iast (2.199 ms) : 2135, 2262
.   : milestone, 2199,
iast_GLOBAL (2.242 ms) : 2178, 2306
.   : milestone, 2242,
profiling (2.047 ms) : 1996, 2098
.   : milestone, 2047,
tracing (2.016 ms) : 1967, 2065
.   : milestone, 2016,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.476 ms [1.465 ms, 1.488 ms] -
appsec 3.685 ms [3.468 ms, 3.902 ms] 2.209 ms (149.6%)
iast 2.21 ms [2.147 ms, 2.274 ms] 734.075 µs (49.7%)
iast_GLOBAL 2.254 ms [2.19 ms, 2.318 ms] 777.759 µs (52.7%)
profiling 2.052 ms [2.0 ms, 2.103 ms] 575.539 µs (39.0%)
tracing 2.032 ms [1.982 ms, 2.082 ms] 555.937 µs (37.7%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.473 ms [1.462 ms, 1.485 ms] -
appsec 3.693 ms [3.475 ms, 3.911 ms] 2.22 ms (150.7%)
iast 2.199 ms [2.135 ms, 2.262 ms] 725.459 µs (49.2%)
iast_GLOBAL 2.242 ms [2.178 ms, 2.306 ms] 768.986 µs (52.2%)
profiling 2.047 ms [1.996 ms, 2.098 ms] 574.127 µs (39.0%)
tracing 2.016 ms [1.967 ms, 2.065 ms] 543.356 µs (36.9%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004
    dateFormat X
    axisFormat %s
section baseline
no_agent (14.945 s) : 14945000, 14945000
.   : milestone, 14945000,
appsec (15.037 s) : 15037000, 15037000
.   : milestone, 15037000,
iast (18.606 s) : 18606000, 18606000
.   : milestone, 18606000,
iast_GLOBAL (18.166 s) : 18166000, 18166000
.   : milestone, 18166000,
profiling (15.626 s) : 15626000, 15626000
.   : milestone, 15626000,
tracing (15.139 s) : 15139000, 15139000
.   : milestone, 15139000,
section candidate
no_agent (14.913 s) : 14913000, 14913000
.   : milestone, 14913000,
appsec (14.875 s) : 14875000, 14875000
.   : milestone, 14875000,
iast (18.703 s) : 18703000, 18703000
.   : milestone, 18703000,
iast_GLOBAL (18.047 s) : 18047000, 18047000
.   : milestone, 18047000,
profiling (15.263 s) : 15263000, 15263000
.   : milestone, 15263000,
tracing (15.07 s) : 15070000, 15070000
.   : milestone, 15070000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 14.945 s [14.945 s, 14.945 s] -
appsec 15.037 s [15.037 s, 15.037 s] 92.0 ms (0.6%)
iast 18.606 s [18.606 s, 18.606 s] 3.661 s (24.5%)
iast_GLOBAL 18.166 s [18.166 s, 18.166 s] 3.221 s (21.6%)
profiling 15.626 s [15.626 s, 15.626 s] 681.0 ms (4.6%)
tracing 15.139 s [15.139 s, 15.139 s] 194.0 ms (1.3%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 14.913 s [14.913 s, 14.913 s] -
appsec 14.875 s [14.875 s, 14.875 s] -38.0 ms (-0.3%)
iast 18.703 s [18.703 s, 18.703 s] 3.79 s (25.4%)
iast_GLOBAL 18.047 s [18.047 s, 18.047 s] 3.134 s (21.0%)
profiling 15.263 s [15.263 s, 15.263 s] 350.0 ms (2.3%)
tracing 15.07 s [15.07 s, 15.07 s] 157.0 ms (1.1%)

@raphaelgavache raphaelgavache force-pushed the raphael/memfd branch 3 times, most recently from d8eb447 to 65ffa65 on October 8, 2025 22:09
@bm1549
Contributor

bm1549 commented Oct 9, 2025

@raphaelgavache iiuc GraalVM and Spring Native are not expected to work because of the way native libraries work with native image. So long as it won't crash or cause other adverse behavior in those scenarios, I think we're good

@dougqh
Contributor

dougqh commented Oct 9, 2025

> @raphaelgavache iiuc GraalVM and Spring Native are not expected to work because of the way native libraries work with native image. So long as it won't crash or cause other adverse behavior in those scenarios, I think we're good

Yes, Graal native image support is a while off for any native library usage.
In this case, the problem is JNA itself, because JNA's dynamic class generation doesn't work well with the AoT approach of Graal native image.

public class ServiceDiscovery {
  private static final Logger log = LoggerFactory.getLogger(ServiceDiscovery.class);

  private static final byte[] SCHEMA_VERSION = "schema_version".getBytes(ISO_8859_1);
Contributor

@dougqh dougqh Oct 9, 2025

For this particular case, I'd rather not have the constants.
The problem is that they end up permanently consuming memory when we really only intend to use them once.

Admittedly, this is a weird case and it has got me thinking about whether we want to just unload the class, but that's on platform to figure out.
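
One way to read this suggestion, as a sketch only: encode the key names at the point of use, so the resulting arrays become garbage as soon as the payload is built, instead of pinning them in static fields. `PayloadKeys`, `encodeKey` and `buildPayload` are illustrative names, and the length-prefixed layout is a simplification, not the tracer's actual wire format.

```java
import java.io.ByteArrayOutputStream;
import static java.nio.charset.StandardCharsets.ISO_8859_1;

// Illustrative only; the real ServiceDiscovery class may be organised differently.
public class PayloadKeys {
  // Encodes the key on demand; the byte[] is short-lived and can be collected after use.
  static byte[] encodeKey(String key) {
    return key.getBytes(ISO_8859_1);
  }

  // Writes each field as <length byte><bytes>; a simplified stand-in for the real encoder.
  public static byte[] buildPayload(String tracerVersion) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeField(out, encodeKey("tracer_version"));
    writeField(out, encodeKey(tracerVersion));
    return out.toByteArray();
  }

  private static void writeField(ByteArrayOutputStream out, byte[] bytes) {
    out.write(bytes.length); // assumes fields shorter than 256 bytes
    out.write(bytes, 0, bytes.length);
  }
}
```

No `static final byte[]` field survives the call, which is the property the comment is after.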

TracerVersion.TRACER_VERSION,
config.getHostName(),
config.getRuntimeId(),
config.getServiceName(),
Contributor

This is using the statically configured service name which isn't necessarily the service name that the tracer will use in the end. Are we okay with that?

Contributor

We might also want to consider moving this to a background task, so it doesn't impact start-up.

mapElements += (processTags != null && processTags.length() > 0) ? 1 : 0;
mapElements += (containerID != null && !containerID.isEmpty()) ? 1 : 0;

SimpleUtf8Cache encodingCache = new SimpleUtf8Cache(256);
Contributor

@dougqh dougqh Oct 9, 2025

I would skip the cache in this case.
Since the code creates a new cache each time, the cache provides no benefit here. The cache is also slower than doing the encoding directly; it only helps reduce allocation when the same strings are encoded again and again.

But since this code is only called a handful of times, there's not much chance to save on allocation.
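
To make the argument concrete, here is a toy model with a hypothetical `CountingCache` (a plain `HashMap`, not the tracer's `SimpleUtf8Cache`, whose internals are not shown in this thread). A cache created per call and asked for each distinct string once records only misses, so every lookup pays for the encoding plus the cache bookkeeping.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in used only to illustrate the reasoning above.
public class CountingCache {
  private final Map<String, byte[]> entries = new HashMap<>();
  public int hits;
  public int misses;

  public byte[] encode(String value) {
    byte[] cached = entries.get(value);
    if (cached != null) {
      hits++;
      return cached;
    }
    misses++;
    byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
    entries.put(value, encoded);
    return encoded;
  }

  // Models one payload write: a fresh cache, each field encoded once.
  public static CountingCache writeOnce(String... fields) {
    CountingCache cache = new CountingCache();
    for (String field : fields) {
      cache.encode(field);
    }
    return cache;
  }
}
```

A cache only pays off when it outlives a single write and sees the same strings repeatedly.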

@amarziali
Contributor

I think that the write should be done asynchronously (not on the premain) since the startup time skyrocketed. Ideally, the related class loading could also be deferred to a scheduled task that runs after the tracer has started.
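
A minimal sketch of the deferral being asked for, using `java.util.concurrent` as a stand-in. The tracer has its own scheduler (`AgentTaskScheduler` is mentioned later in this thread), whose API is not reproduced here; `DeferredWrite` and `schedule` are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run a one-off task off the startup thread.
public class DeferredWrite {
  public static ScheduledFuture<?> schedule(Runnable writeTask, long delayMillis) {
    ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(
            r -> {
              Thread t = new Thread(r, "service-discovery-writer");
              t.setDaemon(true); // never keep the JVM alive for this task
              return t;
            });
    ScheduledFuture<?> future =
        executor.schedule(
            () -> {
              try {
                writeTask.run();
              } catch (Throwable e) {
                // Service discovery is best effort: swallow failures.
              }
            },
            delayMillis,
            TimeUnit.MILLISECONDS);
    // By default, an already-scheduled delayed task still runs after shutdown().
    executor.shutdown();
    return future;
  }
}
```

Because the task body is only referenced from the lambda, classes it touches are loaded on the worker thread rather than during premain.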

@raphaelgavache raphaelgavache marked this pull request as ready for review October 14, 2025 12:52
@raphaelgavache raphaelgavache requested a review from a team as a code owner October 14, 2025 12:52
@github-actions
Contributor

github-actions bot commented Oct 14, 2025

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

  • Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

static final boolean DEFAULT_TELEMETRY_LOG_COLLECTION_ENABLED = true;
static final int DEFAULT_TELEMETRY_DEPENDENCY_RESOLUTION_QUEUE_SIZE = 100000;

static final boolean DEFAULT_SERVICE_DISCOVERY_ENABLED = true;
Contributor

❔ question: Are there some other products doing JNA at startup by default?

Member Author

I don't know about the JNA part, but the memfd has been enabled by default on all tracers except Java for several months now. Agent products consume it.

tagInterceptor,
strictTraceWrites,
instrumentationGateway,
null, // you might refactor this as well
Contributor

❔ question: left over? Have a noop instead?

Member Author

Just removed the comment. I'm not sure about the difference between null and a noop class, or where to find a good example to follow,
so I stayed with null, but I can revisit it.

Contributor

NoOp class will probably be the better option. Especially if you use classes rather than interfaces.
With classes, class hierarchy analysis optimizations will kick in, so after JIT-ing, you still get a direct call with no type checks before the call.

But the more important thing is just to make the whole thing async
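
For reference, a sketch of the no-op shape being suggested. `Writer`, `NOOP`, `CountingWriter` and `writerFor` are hypothetical stand-ins, not the tracer's classes. The caller always holds a non-null object, so call sites need no null checks.

```java
// Hypothetical types for illustration; names do not match dd-trace-java.
public class NoOpExample {
  // A class rather than an interface, per the suggestion above.
  public static class Writer {
    public static final Writer NOOP = new Writer();

    public void write(byte[] payload) {
      // Intentionally empty: the base class is the no-op.
    }
  }

  public static final class CountingWriter extends Writer {
    public int writes;

    @Override
    public void write(byte[] payload) {
      writes++; // stands in for the real memfd write
    }
  }

  // Picks the real writer only when the feature is enabled; never returns null.
  public static Writer writerFor(boolean enabled) {
    return enabled ? new CountingWriter() : Writer.NOOP;
  }
}
```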

Contributor

@PerfectSlayer PerfectSlayer left a comment

📝 notes: I just realised I did not submit the main comment about my review 😅
Cleaning up the initialization was one of the reasons to use a Supplier in the first place

After Doug's comment, serviceDiscoveryFactory() should even return the NOOP rather than null.


maybeEnableServiceDiscovery(tracerBuilder);

installGlobalTracer(tracerBuilder.build());
Contributor

Suggested change
installGlobalTracer(tracerBuilder.build());
.pollForTracingConfiguration()
.serviceDiscoveryFactory(TracerInstaller::serviceDiscoveryFactory)
.build();
installGlobalTracer(tracer);

where I would directly use a method reference to create the MemFDUnixWriter which can return null if the feature is disabled or unavailable.

@SuppressForbidden // intentional use of Class.forName
private static ServiceDiscoveryFactory serviceDiscoveryFactory() {
  if (!Config.get().isServiceDiscoveryEnabled()) {
    return null;
  }
  if (!OperatingSystem.isLinux()) {
    log.debug("service discovery not supported outside linux");
    return null;
  }
  // make sure this branch is not considered possible for graalvm artifact
  if (Platform.isNativeImageBuilder() || Platform.isNativeImage()) {
    log.debug("service discovery not supported on native images");
    return null;
  }

  try {
    // use reflection to load MemFDUnixWriter so it doesn't get picked up when we
    // transitively look for all tracer class dependencies to install in GraalVM via
    // VMRuntimeInstrumentation
    Class<?> memFdClass =
        Class.forName("datadog.trace.agent.tooling.servicediscovery.MemFDUnixWriter");
    ForeignMemoryWriter memFd =
        (ForeignMemoryWriter) memFdClass.getConstructor().newInstance();
    return new ServiceDiscovery(memFd);
  } catch (Throwable e) {
    log.debug("service discovery not supported", e);
    return null;
  }
}
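
The reflection step in this suggestion can be tried in isolation. The sketch below loads a class by name and returns null when it is missing or cannot be instantiated. `OptionalLoader` and `tryLoad` are hypothetical names, and `java.util.ArrayList` merely stands in for the `MemFDUnixWriter` class.

```java
// Hypothetical helper showing the load-by-name pattern; not tracer code.
public class OptionalLoader {
  // Returns a new instance of className cast to type, or null if it is unavailable.
  public static <T> T tryLoad(String className, Class<T> type) {
    try {
      Class<?> clazz = Class.forName(className);
      return type.cast(clazz.getConstructor().newInstance());
    } catch (Throwable e) {
      // Missing class, missing no-arg constructor, or wrong type: treat as "not supported".
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(tryLoad("java.util.ArrayList", java.util.List.class) != null);
    System.out.println(tryLoad("does.not.Exist", java.util.List.class) != null);
  }
}
```

Since the class name appears only as a string, nothing links to the optional class statically, which is the point made in the code comment about GraalVM.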

Member Author

I'm sorry, I'm not familiar enough with factory patterns and can't get this to compile. Would it be possible for you to commit directly what you have in mind here, please?

Contributor

Sure, will do later this week 👌

Member Author

Awesome, thanks!

Contributor

What's the benefit of introducing a method reference and always calling serviceDiscoveryFactory?

I could imagine you might want to defer all service-discovery logic, but IMHO the current approach is more readable and skips setting any factory when we know up-front that it's not possible or applicable.

Also note that if you do set a serviceDiscoveryFactory then that will always trigger a call in CoreTracer to schedule an async task via AgentTaskScheduler to evaluate that factory. On platforms where we know up-front that service discovery is not possible/applicable this will lead to an unnecessary call to schedule a task which will do nothing.

Given this I would leave this code as it is today.
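
The trade-off described here can be sketched with a toy builder. `Builder`, `configure` and the `scheduledTasks` counter are invented for this example, and `CoreTracer` itself is not reproduced. Setting a factory always costs one scheduled task, even when the factory would produce nothing, whereas leaving it unset schedules none.

```java
import java.util.function.Supplier;

// Toy model of the behaviour described above; not the tracer's builder.
public class FactoryRegistration {
  public static class Builder {
    private Supplier<Object> factory;
    public int scheduledTasks;

    public Builder serviceDiscoveryFactory(Supplier<Object> factory) {
      this.factory = factory;
      return this;
    }

    public Builder build() {
      if (factory != null) {
        scheduledTasks++; // stands in for scheduling an async task that evaluates the factory
      }
      return this;
    }
  }

  // Registers the factory only when the platform check passes up front.
  public static Builder configure(boolean supported) {
    Builder builder = new Builder();
    if (supported) {
      builder.serviceDiscoveryFactory(Object::new);
    }
    return builder.build();
  }
}
```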

Contributor

@amarziali amarziali left a comment

Feature-wise, this looks good to me. The implementation does not affect performance indicators and can be opted out of.
@raphaelgavache can you please add deactivation instructions to the PR description?
Also, there is room for improvement, raised by @PerfectSlayer, that should be considered before merging this or addressed in a follow-up PR.

return instrumenterConfig.isTraceEnabled();
}

public boolean isServiceDiscoveryEnabled() {
Contributor

thanks for adding a feature-flag to control this!

Contributor

@mcculls mcculls left a comment

LGTM, thanks for putting this together

Contributor

@PerfectSlayer PerfectSlayer left a comment

Just pushed factory refactoring


NativeLong written = libc.write(memFd, buf, new NativeLong(payload.length));
if (written.longValue() != payload.length) {
  log.warn("write to memfd failed errno={}", Native.getLastError());
Contributor

❔ question: Should we clear the memfd if write failed?

Member Author

it is safe to have it partially written and open, so I think the less libc interaction the better

public void write(byte[] payload) {
  final LibC libc = Native.load("c", LibC.class);

  int memFd = libc.memfd_create("datadog-tracer-info", MFD_CLOEXEC | MFD_ALLOW_SEALING);
Member Author

Just realised that the agent implementation, unlike the system tests, matches on an additional "-".
Adding it to the file name.

@raphaelgavache raphaelgavache merged commit 33e27c7 into master Oct 16, 2025
534 checks passed
@raphaelgavache raphaelgavache deleted the raphael/memfd branch October 16, 2025 18:24
@github-actions github-actions bot added this to the 1.55.0 milestone Oct 16, 2025

Labels

comp: core (Tracer core), type: enhancement (Enhancements and improvements)


6 participants